Skip to content

Conversation

@martin-gpy
Copy link
Contributor

During nvme connect-all, if a discovery log page record reports the sectype as anything other than NVMF_TCP_SECTYPE_NONE in nvmf_connect_disc_entry(), it then assumes that tls should be default set for the same. But this holds true only for configured PSK TLS connections alone and not generated PSK TLS (i.e. secure channel concat) connections since both concat and tls flags are meant to be mutually exclusive. Fix the same.

Fixes: #3034

@igaw
Copy link
Collaborator

igaw commented Jan 5, 2026

I think the nvmf_connect_disc_entry function should not blindly enable tls in this case.

Something like this here:

	if (e->trtype == NVMF_TRTYPE_TCP &&
	    e->tsas.tcp.sectype != NVMF_TCP_SECTYPE_NONE &&
	    !cfg->tls)
		c->cfg.tls = true;

If we check the command line for --concat and --tls and ensure that the user doesn't provide both options, then this here would be a good place to log what is happening. It seems relevant to report to me. What do you think?

@martin-gpy
Copy link
Contributor Author

martin-gpy commented Jan 5, 2026

I think the nvmf_connect_disc_entry function should not blindly enable tls in this case.

Yes, I agree. This code snippet at nvmf_connect_disc_entry() is really unwarranted IMO. But anyways...

Something like this here:

	if (e->trtype == NVMF_TRTYPE_TCP &&
	    e->tsas.tcp.sectype != NVMF_TCP_SECTYPE_NONE &&
	    !cfg->tls)
		c->cfg.tls = true;

If we check the command line for --concat and --tls and ensure that the user doesn't provide both options, then this here would be a good place to log what is happening. It seems relevant to report to me. What do you think?

This won't help because c->cfg.tls (or even c->cfg.concat for that matter) is always false at this point while parsing the discovery log page record at nvmf_connect_disc_entry(). IIRC, the user provided options for these flags are updated only in nvmf_add_ctrl() during the merge_config().

That's the reason I made this fix in nvmf_add_ctrl() itself in the above commit, and not in nvmf_connect_disc_entry().

@martin-gpy
Copy link
Contributor Author

Or should we just do away with the current code snippet of enabling tls if sectype is != NVMF_TCP_SECTYPE_NONE in nvmf_connect_disc_entry(), so that we completely rely on the user provided options itself for the same? That would be simple and straightforward. And then like you said, throw an additional error if tls and concat are invoked together.

What do you say?

@igaw
Copy link
Collaborator

igaw commented Jan 7, 2026

There was already some changes on this condition:

  • 3962a45 ("fabrics: add fabrics config option 'tls'")
  • 1f5db47 ("fabrics: use SECTYPE to determine whether to use TLS"

After reading up on the SECTYPE in the spec, this condition seems to implement the Host Configuration setting for TLS Required. So I don't think we can just drop it. Need to read more how the concat stuff is supposed to work together with TLS as next. Tricky stuff.

BTW, I think we should add a blktests for this scenario.

@hreinecke @calebsander

@martin-gpy
Copy link
Contributor Author

martin-gpy commented Jan 7, 2026

After reading up on the SECTYPE in the spec, this condition seems to implement the Host Configuration setting for TLS Required. So I don't think we can just drop it. Need to read more how the concat stuff is supposed to work together with TLS as next. Tricky stuff.

Then why not go with the current fix itself in the above commit at d1377a7? That way, we retain the existing check of default enabling --tls whenever sectype is != NVMF_TCP_SECTYPE_NONE in the discovery log page record, but at the same time ensuring --tls is turned off whenever --concat is invoked. That's also simple and straightforward and avoids all this confusion in the first place. After all, both --tls and --concat flags are used for creating TLS connections itself, but it's just that they correspond to mutually exclusive TLS features (--tls is used for configured PSK TLS whereas --concat is used for generated PSK TLS) with separate code paths.

@martin-gpy
Copy link
Contributor Author

martin-gpy commented Jan 7, 2026

I was digging more into this, and looks like there is a simple solution for this issue.

For configured PSK, the treq field in the discovery log page record is set to required whereas it is set to not required for generated PSK. So we can use that field to distinguish between the two here and set the appropriate flags for the same. Sample patch below:

        if (e->trtype == NVMF_TRTYPE_TCP &&
-           e->tsas.tcp.sectype != NVMF_TCP_SECTYPE_NONE)
-               c->cfg.tls = true;
+           e->tsas.tcp.sectype != NVMF_TCP_SECTYPE_NONE) {
+               if (e->treq & NVMF_TREQ_REQUIRED)
+                       c->cfg.tls = true;
+               else if (e->treq & NVMF_TREQ_NOT_REQUIRED)
+                       c->cfg.concat = true;
+       }

@martin-gpy
Copy link
Contributor Author

@hreinecke - Could you please share your comments on this?

@igaw
Copy link
Collaborator

igaw commented Jan 9, 2026

image

If I read it correctly, the TSC value not supported or support but not required is not automatically mandating concatenation. Altough when TASC is it seems asking for concat. Do you have a pointer where not supported means concatenation should be used?

@martin-gpy
Copy link
Contributor Author

martin-gpy commented Jan 9, 2026

If I read it correctly, the TSC value not supported or support but not required is not automatically mandating concatenation. Altough when TASC is it seems asking for concat. Do you have a pointer where not supported means concatenation should be used?

This is the spec interpretation following the discussion with Hannes:

Configuration Type TASC TSC SECTYPE
Configured PSK TLS with inband auth disabled 00b (Not Specified) 01b (Required) 02 (TLS 1.3)
Configured PSK TLS with inband auth enabled 01b (Auth Required) 01b (Required) 02 (TLS 1.3)
Generated PSK TLS (with inband auth) 10b (Auth with secure concat) 10b (Not Required) 02 (TLS 1.3)

So for generated PSK TLS (i.e. concat), it doesn't matter whether TSC is supported or not supported, but the value should be 10b (Not Required).

@martin-gpy
Copy link
Contributor Author

martin-gpy commented Jan 9, 2026

So effectively, we can ignore TASC for now and focus only on the TSC value here (within the TREQ field) for distinguishing between the two TLS types - TSC being 01b (Required) for configured PSK TLS, and 10b (Not Required) for generated PSK TLS (i.e. secure channel concat) respectively.

So given enum NVMF_TREQ_REQUIRED value is set to 1 and NVMF_TREQ_NOT_REQUIRED value is set to 2, the above check at 597fc46 should already hold good for distinguishing between the two:

if (e->treq & NVMF_TREQ_REQUIRED)
           c->cfg.tls = true;
else if (e->treq & NVMF_TREQ_NOT_REQUIRED)
           c->cfg.concat = true;

Right?

@martin-gpy
Copy link
Contributor Author

Any updates here?

Without this fix, one is unable to secure concat TLS connect across several subsystems at one go using the JSON config file. Instead, one has to manually connect to each subsystem here one at a time, which is a big pain.

@igaw
Copy link
Collaborator

igaw commented Jan 14, 2026

I've started to look into it but run out of time.

@martin-gpy
Copy link
Contributor Author

Tested nvme connect-all successfully using the JSON config file with 597fc46 applied for both configured PSK TLS & generated PSK TLS respectively. No issues here. Let me know if you want me to attach the logs.

@igaw
Copy link
Collaborator

igaw commented Jan 15, 2026

I assume you are testing with the 2.x branch, as the current master will crash when entering the discover path. Also figured out that we do not really handle the config parsing correctly for the tls/concat parameters (at least in master). There is a lot of cleanup necessary and that's also why I said I wont take any patches for fabrics until the cleanup has been done. It's a huge mess at the moment.

Unfortunately, I can't test your changes with nvmet, as it seems it lacks the support for this type of configuration. Though, though the change is very contain and doesn't add any other random hack, I'll take it.

Another thing I started to wonder: should we enforce tls 1.3 and drop support tls 1.2?

igaw and others added 2 commits January 15, 2026 10:28
The discovery controller is using the identify command and also the get
log page command. Thus, create the transport handle for the discovery
controller.

Signed-off-by: Daniel Wagner <[email protected]>
During nvme connect-all, if a discovery log page record reports the
sectype as anything other than NVMF_TCP_SECTYPE_NONE in
nvmf_connect_disc_entry(), it then assumes that --tls should be default
set for the same. But this holds true only for configured PSK TLS alone
and not for generated PSK TLS. For generated PSK TLS connections using
--concat (i.e. secure channel concat), this would lead to connection
failures since both --tls and --concat are not to be invoked together.

Fix this by distinguishing the two through their respective treq values
and setting the appropriate --tls or --concat flags for each.

Signed-off-by: Martin George <[email protected]>
@igaw igaw force-pushed the concat_connect_all branch from 597fc46 to c912140 Compare January 15, 2026 09:28
@igaw igaw merged commit 3f2ca96 into linux-nvme:master Jan 15, 2026
20 checks passed
@martin-gpy
Copy link
Contributor Author

martin-gpy commented Jan 15, 2026

I assume you are testing with the 2.x branch, as the current master will crash when entering the discover path. Also figured out that we do not really handle the config parsing correctly for the tls/concat parameters (at least in master). There is a lot of cleanup necessary and that's also why I said I wont take any patches for fabrics until the cleanup has been done. It's a huge mess at the moment.

Weird. I have been using the master branch itself, but didn't encounter any such crash in the discovery path. Could it be because I'm running this on a relatively old kernel like SLES15 SP7 MU (i.e. kernel-default-6.4.0-150700.53.25.1) ?

I understand there is a lot of cleanup required here in the master branch. But in context of --concat, I wanted to add some new error messages as you had suggested (like erroring out if --tls and --concat are invoked together, etc.). Shall I still go ahead with a new pull request for the same in libnvme fabrics, or do you want me to wait till this cleanup is complete?

Unfortunately, I can't test your changes with nvmet, as it seems it lacks the support for this type of configuration. Though, though the change is very contain and doesn't add any other random hack, I'll take it.

Thanks for merging this. As said above, it solves a big overhead by enabling multiple connects to multiple subsystems via the JSON file.

Another thing I started to wonder: should we enforce tls 1.3 and drop support tls 1.2?

Yes, I think that makes sense because Section 3.6.1.1 Transport Specific Address Subtype: TLS of the NVMe/TCP Transport Specification 1.2 explicitly states:

"TLS version 1.2 should not be used with NVMe/TCP."

@hreinecke - do you concur?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

concat connections fail during connect-all

2 participants